Selecting Genes with Dissimilar Discrimination Strength for Sample Class Prediction
Authors
Abstract
One of the main applications of microarray technology is to determine the gene expression profiles of diseases and disease treatments. This is typically done by selecting a small number of genes, from among thousands to tens of thousands, whose expression values are collectively used as classification profiles. Gene selection is notoriously challenging because microarray data normally contain only a very small number of samples but range over thousands to tens of thousands of genes. Most existing gene selection methods carefully define a function that scores each gene's differential expression across conditions and then pick the top-ranked genes. Such single-gene scoring methods suffer because some selected genes have very similar expression patterns, so using them all in classification is largely redundant. Furthermore, these selected genes can crowd out other genes that are individually less, but collectively more, differentially expressed. We propose to cluster genes in terms of their class discrimination strength and to limit the number of genes selected per cluster. Combining this idea with several existing single-gene scoring methods, we show by experiments on two cancer microarray datasets that our methods identify gene subsets that collectively achieve significantly higher classification accuracies.
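The idea of capping the number of selected genes per cluster can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which scoring functions or clustering procedure are used, so the t-like score, the function names `t_like_score` and `select_with_cluster_cap`, and the parameters `n_select` and `max_per_cluster` are all assumptions made for illustration. Cluster labels are taken as given, produced by whatever clustering step is chosen.

```python
import numpy as np

def t_like_score(X, y):
    """Absolute two-class t-like score per gene (assumed scoring function).

    X: samples x genes expression matrix; y: class labels 0/1 per sample.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 0], X[y == 1]
    # Welch-style standard error; the small epsilon avoids division by zero.
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a)
                 + b.var(axis=0, ddof=1) / len(b)) + 1e-12
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

def select_with_cluster_cap(scores, cluster_labels, n_select, max_per_cluster):
    """Walk genes from best to worst score, skipping genes whose cluster is full.

    cluster_labels[g] is the (hypothetical) cluster id of gene g.
    Returns the indices of the selected genes in selection order.
    """
    counts = {}
    chosen = []
    for g in np.argsort(-np.asarray(scores, dtype=float), kind="stable"):
        c = cluster_labels[g]
        if counts.get(c, 0) < max_per_cluster:
            chosen.append(int(g))
            counts[c] = counts.get(c, 0) + 1
        if len(chosen) == n_select:
            break
    return chosen

# Toy usage: genes 0-2 share a cluster, genes 3-4 share another.
scores = [9.0, 8.0, 7.0, 6.0, 5.0]
clusters = [0, 0, 0, 1, 1]
top = select_with_cluster_cap(scores, clusters, n_select=3, max_per_cluster=2)
```

With plain top-3 ranking, all three genes would come from the first cluster; with the cap, the third slot goes to the best gene of the second cluster, which is the kind of lower-ranked but less redundant gene the abstract argues for.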
Similar resources
Gene Selection for Multi-Class Prediction of Microarray Data
Gene expression data from microarrays have been successfully applied to class prediction, where the purpose is to classify and predict the diagnostic category of a sample by its gene expression profile. A typical microarray dataset consists of expression levels for a large number of genes on a relatively small number of samples. As a consequence, one basic and important question associated with...
Effect of Temperature and Time on the Joint Properties of AISI420 Steel to SAF2507 Steel Produced by Transient Liquid Phase Process
In this research, the effect of temperature and time on the properties of AISI420/SAF2507 dissimilar joint produced by transient liquid phase bonding process was investigated. A BNi-2 interlayer with 25 μm thickness was inserted between two dissimilar steel samples. The bonding process was performed at 1050 °C and 1100 °C for different bonding times. The microstructures of the joints were studi...
Equivalence class formation via identity matching to sample and simple discrimination with class-specific consequences
Human participant performances often show evidence of learning untrained relations when conditional discrimination training between physically dissimilar stimuli is conducted. These emergent relations document equivalence class formation. The current study investigated whether class-specific consequences (i.e. the specific reinforcers used for each potential class during training) also join the...
Dissimilar resistance spot welding of AISI 1075 eutectoid steel to AISI 201 stainless steel
In this paper, dissimilar resistance spot welding of AISI 1075 eutectoid steel to AISI 201 stainless steel is investigated experimentally. For this purpose, the experiments are designed using response surface methodology and based on four-factor, five-level central composite design. The effects of process parameters such as welding current, welding time, cooling time and electrode force are inv...
Comparison of Bayesian and Frequentist Methods in Estimating the Net Reclassification and Integrated Discrimination Improvement Indices for Evaluation of Prediction Models: Tehran Lipid and Glucose Study
Introduction: The frequency-based method is commonly used to estimate the Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI) indices. These indices measure the magnitude of the improvement in the performance of statistical models when a new biomarker is added. This method has poor performance in some cases, especially in small samples. In this study, the performance of two Bay...